A Tweets Classifier based on Cosine Similarity
Authors
Abstract
The 2017 Microblog Cultural Contextualization task consists of three challenges: (1) Content Analysis, (2) Microblog Search, and (3) TimeLine Illustration. This paper describes the use of cosine similarity, which measures the similarity between two non-zero vectors of an inner product space as the cosine of the angle between them. This research used two approaches, (1) word2vec and (2) Bag-of-Words (BoW), to extract the tweets relevant to each event related to four festivals: Charrues, Transmusicales, Avignon and Edinburgh.
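The retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes each event is described by a short text query, each tweet is scored by its cosine similarity to that query, and tweets above a threshold are kept. The tokenizer, the `threshold=0.2` value, the averaged-word-vector tweet representation and the toy embeddings are all hypothetical choices for illustration, not the paper's actual settings. In practice the embeddings would come from a trained word2vec model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and keep \w+ tokens; real tweet preprocessing
    # would also need to handle hashtags, mentions, URLs and accents.
    return re.findall(r"\w+", text.lower())


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {dimension: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def bow_vector(text):
    # Bag-of-Words representation: raw term counts
    return Counter(tokenize(text))


def mean_embedding(text, embeddings):
    # word2vec-style representation (assumption): average the vectors of the
    # in-vocabulary words; out-of-vocabulary words are skipped.
    vecs = [embeddings[w] for w in tokenize(text) if w in embeddings]
    if not vecs:
        return {}
    dim = len(vecs[0])
    return {i: sum(v[i] for v in vecs) / len(vecs) for i in range(dim)}


def relevant_tweets(event_query, tweets, threshold=0.2):
    # Hypothetical helper: rank tweets by BoW cosine similarity to the event
    # query and keep those scoring at or above the (hypothetical) threshold.
    q = bow_vector(event_query)
    scored = [(cosine(q, bow_vector(t)), t) for t in tweets]
    return [(s, t) for s, t in sorted(scored, reverse=True) if s >= threshold]


tweets = [
    "Great concert at the Vieilles Charrues festival tonight",
    "Traffic jam on the highway this morning",
    "Charrues festival line-up announced",
]
for score, tweet in relevant_tweets("Charrues festival concert", tweets):
    print(f"{score:.2f}  {tweet}")

# Toy, hypothetical 2-D "embeddings" standing in for trained word2vec vectors
toy_embeddings = {"festival": [1.0, 0.0], "concert": [0.9, 0.1], "traffic": [0.0, 1.0]}
print(cosine(mean_embedding("festival", toy_embeddings),
             mean_embedding("concert", toy_embeddings)))
```

With BoW, only exact word overlap counts, so the unrelated traffic tweet scores 0 and is dropped. The embedding variant can also reward related but non-identical words such as "festival" and "concert", which is the usual motivation for comparing word2vec against BoW.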
Similar resources
Detecting Newsworthy Topics in Twitter
The task of the SNOW 2014 Data Challenge is to mine Twitter streams to provide journalists a set of headlines and complementary information that summarize the most newsworthy topics for a number of given time intervals. We propose a 4-step approach to solve this. First, a classifier is trained to determine whether a Twitter user is likely to post tweets about newsworthy stories. Second, tweets ...
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to automatically create modules that would help speed up the classification process for any text categorization task. It also serves as a tool for...
KELabTeam: A Statistical Approach on Figurative Language Sentiment Analysis in Twitter
In this paper, we propose a new statistical method for sentiment analysis of figurative language within short texts collected from Twitter (called tweets) as a part of SemEval2015 Task 11. Particularly, the proposed model focuses on classifying the tweets into three categories (i.e., sarcastic, ironic, and metaphorical tweet) by extracting two main features (i.e., term features and emotion patt...
LSIS at SemEval-2017 Task 4: Using Adapted Sentiment Similarity Seed Words For English and Arabic Tweet Polarity Classification
In this paper, we present our contribution to SemEval-2017 Task 4, "Sentiment Analysis in Twitter", Subtask A, "Message Polarity Classification", for English and Arabic. Our system is based on a list of sentiment seed words adapted for tweets. The sentiment relations between seed words and other terms are captured by cosine similarity between the word embedding representations (word2...
Summarizing Disaster Related Event from Microblog
The Information Retrieval Lab at DA-IICT India participated in text summarization of the Data Challenge track of SMERP 2017. SMERP 2017 track organizers have provided the Italy earthquake tweet dataset along with the set of topics which describe important information required during any disaster related incident. The main goal of this task is to gather how well the participant’s system summariz...